Azure AI Foundry: The AI App and Agent Factory

Session Date: May 19, 2025
Duration: 1 hour
Venue: Build 2025 Conference - BRK155
Speakers: Yina Arenas (VP Azure AI Foundry, Microsoft), Scott Hanselman (Vice President Developer Community, Microsoft)
Link: [Microsoft Build 2025 Session BRK155]

Executive Summary

This flagship Azure AI Foundry session showcases how Microsoft’s comprehensive AI platform transforms real-world workflows through an ambitious live demonstration: automating Scott Hanselman’s 20-year podcast production process. Yina Arenas and Scott Hanselman demonstrate the platform’s three core pillars�models, agents, and observability�through extensive live coding, revealing how developers can build production-ready AI applications with integrated signals loops that connect model choice, knowledge retrieval, fine-tuning, orchestration, and memory.

Key Topics Covered

?? 1. The Real-World AI Challenge: Hanselminutes Podcast Factory

The 20-Year Corpus Challenge

Scott’s Opening Context: > “I’ve got this podcast that I do on the side… I just published episode 997… There’s over 500 hours of material. I was actually almost delisted by a podcast directory when a young person sent an email saying that they believed that the podcast itself was AI generated because they couldn’t conceive about that amount of work.”

The Automation Philosophy:

“Only take away the parts of the job that suck” - Preserve creative work, automate toil
Human-in-the-loop approach - Maintain editorial control and quality oversight
Real-world constraints - Small team (Scott + Mandy), minimal budget, maximum efficiency

Production Pain Points and Toil

Weekly Administrative Burden:

Show notes generation - “I’m notorious for saying I’m going to put that in the show notes and then you’ll never hear from me again”
Guest biography research - Manual Wikipedia searches and bio compilation
Link verification - Ensuring references are accurate and functional
Transcript generation - Converting audio to searchable text content
Scheduling coordination - Email back-and-forth with potential guests

Time Investment:

2-3 hours weekly of administrative work for 20+ years
Goal reduction to 10-15 minutes with AI assistance
Scalability concern - Process must work for next 1,000 episodes over 20 years

?? 2. Azure AI Foundry: Three-Pillar Architecture

Platform Philosophy

Yina’s Definition: > “The open, flexible, and secure platform that enables you as a developer to infuse AI in every single application that you do. Whether it is a brand new application, or something that you’ve been working for a while.”

Three Core Pillars:

Azure AI Foundry
??? Models: Extensive catalog with intelligent selection
??? Agents: Agentic platform for orchestration and memory
??? Observability: Production monitoring and evaluation tools

Developer Journey:

Idea to code - Rapid prototyping and experimentation
Code to production - Enterprise-ready deployment and monitoring
Integrated signals loop - Continuous improvement through observability

?? 3. Models: The Foundation Layer

Model Explosion and Selection Challenge

Historical Context: > “Two years ago the world was actually very simple. We had just a few foundational models. But the range of change has been unprecedented over the last couple of years.”

Current Ecosystem:

10,000+ models available through Hugging Face partnership
Daily updates with latest model releases and capabilities
Multiple dimensions - Provider, industry, capabilities, deployment type, task support

Live Demo: Model Catalog Navigation

Intelligent Discovery Features:

Built-in agent assistance - “Best model for protein research” queries
Leaderboard comparisons - Quality, safety, cost, throughput benchmarks
Trade-off visualization - Quality vs. cost, quality vs. throughput charts
Dynamic evaluation - Real benchmarks, not hard-coded rankings

New Model Providers:

OpenAI family - GPT-4.1, nano, mini models
DeepSeek, Mistral, Grok - Expanded language model options
Meta, Black Forest Lab - Specialized and research models
Foundry Labs - Microsoft Research model integration

Model Router: Intelligent Cost Optimization

Revolutionary Approach: > “Model-router takes the selection toil from your head… it is a router on top of like the best models.”

Live Demonstration Results:

Simple Query: "Where is Hanoi?" ? Nano model (cost-efficient)
Complex Planning: "Plan a trip with constraints" ? Mini model (balanced capability)
Complex Reasoning: Multi-step problems ? Full reasoning model

Performance Benefits:

60% cost reduction compared to always using GPT-4.1
1-2% accuracy reduction - Minimal quality impact
Environmental efficiency - Reduced computational resources
Automatic optimization - No manual model selection required

Fine-Tuning and Distillation Success

Hanselminutes Model Economics:

Original Processing Cost: $100 for 1,000 episodes
Fine-tuned Model Cost: $1.50 for next 1,000 episodes
Cost Reduction: 98.5% savings through specialization

Technical Achievement:

Whisper-based transcription for initial corpus processing
Distilled specialized model - Only does show notes, no general knowledge
Developer tier announcement - Reduced hosting fees for experimentation

?? 4. Foundry Local: Hybrid AI Architecture

Local AI Capabilities

On-Device Processing Demo:

Phi-4 mini model running locally on laptop GPU
Biographical research combining Perplexity (cloud) + local summarization
GPU memory monitoring - Real-time resource utilization display
Instant startup/shutdown - foundry service stop releases resources

Live Demo: Biography Generation

Hybrid Processing Workflow:

Cloud research phase - Perplexity gathers comprehensive data
Local processing - Phi-4 mini summarizes and formats on GPU
Quality control - Immediate feedback on accuracy and relevance

Demonstration Results:

First attempt - Incorrect biographical details (wrong Yina)
Second attempt - Accurate professional summary and background
Real-time correction - Shows AI fallibility and recovery

?? 5. Agents: Beyond Tools to Orchestrated Intelligence

Defining the Agent Paradigm

Scott’s Practical Definition: > “AI’s don’t have feet and they don’t have hands and they can’t do stuff… I feel like an agent might be able to call multiple tools, but an agent has a task where it’s like, take this and do it and then let me know when you’re finished.”

Yina’s Technical Framework: > “The difference is now you’re having an LLM that is helping you drive the control flow of the program. It’s making some of those decisions for you.”

Agent Capabilities:

Tool orchestration - Coordinated use of multiple capabilities
Memory management - Context preservation across interactions
Multi-modal inputs - Text, speech, images, video processing
Agent-to-agent communication - Collaborative task execution

Agent Service: General Availability Announcement

Enterprise-Ready Platform Features:

Declarative agent creation - Configuration-based development
Cloud execution - Managed scaling and resource allocation
Enterprise integration - Custom file storage, network isolation
Authentication support - API token management and security
Built-in connectors - Fabric, SharePoint, Bing, Azure AI Search

Interoperability Standards:

A2A (Agent-to-Agent) - Cross-platform agent communication
MCP (Model Context Protocol) - Standardized tool integration
LangChain and CrewAI - Popular framework compatibility
OpenAI APIs - Assistants and Response API support

Live Demo: Specialized Agent Creation

Show Notes Agent Implementation:

Fine-tuned nano model - Specialized for podcast show note generation
Cost optimization - Smallest possible model for specific task
Quality preservation - Maintains accuracy while reducing resource usage
Visual Studio integration - One-click export to development environment

Agent Catalog Ecosystem:

Microsoft Build templates - Ready-to-use agent configurations
Partner contributions - Third-party specialized agents
Stanford healthcare example - Academic research integration

??? 6. Multi-Agent Orchestration: The Podcast Factory

Voice Orchestrator Agent

Natural Language Interface: > “Hey, are you there and can you give me a list of the things that you can do to help me with the podcast production?”

Demonstrated Capabilities:

Guest intake collaboration - Brainstorming and outreach strategies
Bio generation - Automated speaker biography creation
Transcript processing - Audio-to-text conversion and formatting
Show notes generation - Structured content creation
Link verification - URL validation and accuracy checking

Advanced Search and Knowledge Retrieval

Agentic Retrieval Innovation: > “This is not just like a one shot… this is what we call agentic retrieval on Azure AI Search.”

Complex Query Processing:

Query: "Over the last 20 years are there any recurring ideas, 
or universal truths in the last 1,000 episodes that have emerged?"

Processing:
??? Query decomposition into multiple sub-questions
??? Parallel information retrieval across corpus
??? Synthesis of findings across temporal patterns
??? Thematic analysis: Pragmatism, mentorship, community

Content Understanding and Schema Definition

Structured Data Extraction:

Speaker identification - Multiple voice recognition and separation
Topic extraction - Thematic categorization of episode content
Quote identification - Notable statements and insights
Title and summary generation - Automated metadata creation

Technical Implementation:

Azure AI Content Understanding service integration
Custom schema definition for podcast-specific fields
Multi-speaker detection - Including advertisement voice differentiation

?? 7. Semantic Kernel and AutoGen Integration

Framework Convergence Announcement

Strategic Platform Unification: > “AutoGen has been coming in as an agentic framework from our Microsoft research organization. Semantic Kernel is the one that we’ve been recommending for using in production… we’re bringing them together into like one agentic framework.”

Technical Implementation:

Shared runtime - Common execution environment
Production focus - Semantic Kernel for enterprise deployment
Research innovation - AutoGen experimental capabilities
Framework convergence - Single unified platform roadmap

Live Code Demonstration

Multi-Agent Workflow Definition:

// Semantic Kernel function integration
var contentAgent = kernel.CreateFunction("ProcessShowNotes");
var linkVerifier = kernel.CreateFunction("VerifyLinks");
var summarizer = kernel.CreateFunction("CreateSummary");

// Workflow orchestration
var workflow = new SequentialWorkflow()
    .AddStep(transcriptAnalysis)
    .AddStep(contentGeneration) 
    .AddStep(linkVerification)
    .AddStep(summaryGeneration);

Visual Workflow Management:

Mermaid chart generation - Visual workflow representation
YAML abstraction - Configuration without direct YAML editing
Process visualization - Clear understanding of agent interactions

?? 8. Observability: Production-Ready AI Applications

Comprehensive Monitoring Stack

Development Lifecycle Support: > “We have this set of tools to support your entire development lifecycle. Whether you are like starting to experiment, or like thinking about going to production.”

OpenTelemetry Integration: Scott’s Context: > “OTel is one of the biggest things… it’s effectively distributed log files and when you have large complicated distributed systems like this, you want all that to come to one place.”

Monitoring Capabilities:

Distributed tracing - End-to-end request flow visibility
Azure Monitor integration - Enterprise monitoring platform compatibility
Custom dashboards - Grafana and existing observability tools
Real-time metrics - Performance and cost optimization insights

AI-Specific Evaluation Framework

Agent-Centric Observability:

Conversation threads - Complete dialogue context and memory
Agent runs - Individual interaction turn analysis
Input/output tracking - Complete data flow monitoring
Metadata collection - Decision-making process transparency

Built-in Evaluation Metrics:

Intent resolution - Task completion accuracy assessment
Relevance scoring - Response quality and context alignment
Code vulnerability detection - Security risk identification
Task adherence - Agent focus and scope compliance
Indirect jailbreak protection - Security against malicious input

CI/CD Pipeline Integration

DevOps Automation:

GitHub Actions integration - Automated evaluation in deployment pipeline
Quality gates - Automated testing before production release
Performance tracking - Historical quality and performance metrics
Regression detection - Automatic identification of capability degradation

?? 9. Enterprise Security and Governance

End-to-End Security Architecture

Identity and Access Management:

Entra ID integration - Agents as first-class directory entities
Specific agent identities - Individual permissions and governance
Principle of least privilege - Minimal required access rights
Audit trail - Complete action and decision logging

Data Protection and Compliance

Microsoft Purview Integration:

Data labeling - Automatic content classification
Data protection - Encryption and access control
Compliance monitoring - Regulatory requirement adherence
Microsoft Defender - Security threat detection and response

Grounding and Restriction Controls

Business Context Enforcement: > “It can’t be overstated how important grounding is… I don’t want my chatbot to talk about things that aren’t this… You don’t want your coffee shop chatbot offering relationship advice.”

Implementation Strategies:

Domain-specific models - Trained only for specific business contexts
Knowledge source restriction - Limited to approved information sources
Custom search scopes - Controlled web access and information retrieval
Response filtering - Output validation and appropriateness checking

Live Demonstration Results

Podcast Factory End-to-End Processing

Real-Time Session Recording:

Complete workflow execution - From raw recording to finished show notes
Multi-agent coordination - Seamless handoff between specialized agents
Quality validation - Human oversight at critical decision points
Production-ready output - Immediately usable show notes and transcripts

Generated Content Quality

Show Notes Output:

# Notable Quotes
- "Imposture syndrome means you're doing something okay"
- "Thoughts are not facts" (attributed to guest)

# Key Takeaways  
- Model router provides 60% cost savings
- Grounding essential for business-focused AI
- Human-in-the-loop maintains quality control

# Resources Mentioned
- BRK155 Build Session
- Azure AI Foundry (AI.Azure.com)
- Scott's YouTube Channel [MISSING LINK - FLAGGED]
- Hanselminutes Episode 881 with Raji

Technical Performance Metrics

Cost Optimization Results:

Model router efficiency - 60% cost reduction with 1-2% accuracy loss
Fine-tuning economics - 98.5% cost reduction for specialized tasks
Local processing - Zero cloud costs for summarization tasks
Resource efficiency - Intelligent model selection based on query complexity

Agent Orchestration Success

Multi-Agent Workflow:

Transcript generation - Whisper-based audio processing
Content analysis - Topic extraction and speaker identification
Show notes creation - Structured content generation
Link verification - URL validation and accuracy checking
Summary generation - Final output compilation

Session Highlights

“I feel very strongly about this work and as a human who created some art, I want to make sure that I’m going to be using AI in a way that only takes away the parts of the job that suck.” - Scott Hanselman

“We took this process that Scott has been doing for 20 years… to erase that toil at scale.” - Yina Arenas

“Model-router takes the selection toil from your head… you can get up to 60% price, like, off in the price.” - Yina Arenas

“The difference is now you’re having an LLM that is helping you drive the control flow of the program. It’s making some of those decisions for you.” - Yina Arenas

“What factory are you going to build? How are you going to like bring that investment to be, instead of return of investment, return on your effort?” - Yina Arenas

Implementation Guide

Getting Started with Azure AI Foundry

1. Model Selection and Optimization Strategy

**Model Catalog Navigation:**

- Use built-in agent for domain-specific model recommendations
- Leverage leaderboards for cost/quality/throughput optimization
- Implement model router for automatic cost optimization
- Consider fine-tuning for specialized use cases

**Cost Optimization Approach:**

- Start with model router for automatic selection
- Measure baseline costs and performance
- Identify opportunities for fine-tuning
- Implement distilled models for specific tasks

2. Agent Development Workflow

// Agent Service Implementation Pattern
var agent = await foundry.CreateAgentAsync(new AgentConfig
{
    Name = "ShowNotesAgent",
    Model = "gpt-4.1-nano-finetuned", 
    Tools = { "content-understanding", "link-verification" },
    Knowledge = { "podcast-corpus", "guest-database" },
    Instructions = "Generate structured show notes with quotes and links"
});

// Multi-agent orchestration with Semantic Kernel
var workflow = kernel.CreateWorkflow()
    .AddAgent(transcriptAgent)
    .AddAgent(contentAgent) 
    .AddAgent(linkAgent)
    .AddAgent(summaryAgent);

3. Observability and Monitoring Setup

**Production Monitoring:**

- Enable OpenTelemetry tracing across agent workflows
- Configure Azure Monitor integration for dashboards
- Set up automated evaluation pipelines in CI/CD
- Implement quality gates and regression detection

**Evaluation Framework:**

- Define custom evaluation metrics for business context
- Implement A/B testing for agent improvements
- Monitor cost and performance trends over time
- Set up alerting for quality degradation

Best Practices for Production Deployment

Agent Design Principles

Single responsibility - Create focused agents for specific tasks
Composable workflows - Design agents to work together seamlessly
Human-in-the-loop - Maintain oversight at critical decision points
Fail gracefully - Implement fallback strategies and error handling

Security and Compliance

Principle of least privilege - Grant minimal required permissions
Data classification - Use Purview for automatic content labeling
Access control - Implement Entra ID integration for agent identities
Audit trails - Maintain complete logs of agent actions and decisions

Cost Management

Model right-sizing - Use smallest model capable of required quality
Router implementation - Automatic model selection based on query complexity
Local processing - Leverage on-premises resources for appropriate tasks
Usage monitoring - Track costs and optimize based on actual usage patterns

Advanced Applications

Content Creation Automation

Podcast Production Pipeline:

Guest sourcing - AI-powered guest recommendation and outreach
Interview preparation - Automated research and question generation
Content processing - Transcription, summarization, and metadata extraction
Publishing workflow - Social media content and promotional material generation

Enterprise Knowledge Management

Large Corpus Processing:

Historical content analysis - Pattern recognition across years of material
Semantic search - Natural language queries across comprehensive datasets
Content synthesis - Multi-source information combination and analysis
Trend identification - Emerging themes and topic evolution tracking

Business Impact and ROI

Productivity Transformation

Hanselminutes Case Study Results:

Time savings - From 2-3 hours to 10-15 minutes weekly
Cost efficiency - 98.5% reduction in processing costs through fine-tuning
Quality maintenance - Human oversight preserves editorial standards
Scalability - Process designed for next 1,000 episodes over 20 years

Enterprise Adoption Metrics

Platform Usage Statistics:

70,000+ customers using Azure AI Foundry for organizational AI
10,000+ users of Azure AI Foundry Agent Service
Millions of agents created for business process automation
Production deployments across Gainsight, Saifr, Heineken, and more

Resources and Further Learning

Official Platform Access

Azure AI Foundry - Main platform access and documentation
Model Catalog - Browse 10,000+ available models
Agent Service - Create and manage AI agents
Visual Studio Code Extension - Development tools and integration

Technical Documentation

Semantic Kernel - Agent orchestration framework
AutoGen - Multi-agent conversation framework
OpenTelemetry Integration - Monitoring and observability
Security and Compliance - Enterprise governance features

About the Speakers

Yina Arenas
VP Azure AI Foundry
Microsoft
Leads Microsoft’s strategy for AI model ecosystems, enterprise AI agents, and AI developer experience. Platform builder whose innovations reach nearly a billion users, redefining how businesses scale with AI.

Scott Hanselman
Vice President, Developer Community
Microsoft
Developer for 30+ years, blogger for 20 years at hanselman.com. Nearly 1,000 episodes of Hanselminutes podcast and 750+ episodes of Azure Friday. Author and speaker to over one million developers worldwide.

This flagship session demonstrates Azure AI Foundry as Microsoft’s comprehensive platform for building production-ready AI applications, combining extensive model catalogs, sophisticated agent orchestration, and enterprise-grade observability into a unified development experience that transforms real-world business processes through intelligent automation.